Abstract: We propose a traffic congestion estimation system
based on an unsupervised on-line learning algorithm. The system
does not rely on background extraction or motion detection. It
extracts local features inside detection regions of variable size,
which are drawn on the lanes in advance. The extracted features
are then clustered into two classes using K-means and Gaussian
Mixture Models (GMM). A Bayes classifier detects vehicles
according to the cluster information, which is continually
updated by an on-line EM algorithm while the system is running.
Experimental results show that our system can be adapted to
various traffic scenes for estimating traffic status.

I. INTRODUCTION

As vision-based traffic monitoring systems become more
prevalent thanks to their low cost and ease of deployment,
research on applying computer vision to traffic state
measurement is attracting more interest than before. Although
the history of traffic monitoring systems can be traced back at
least to 1991 [1] and new approaches are emerging [2], various
problems remain to be tackled. The situation is much worse in
urban areas, where congestion is most likely to occur, especially
at downtown intersections. Monitoring of urban scenes
and highways has been extensively studied. Most of the
approaches are based on background extraction (or motion
detection) and vehicle tracking [3] [4] [5] [6]. These approaches
are common because 1) the background can be easily extracted
under light traffic conditions; 2) tracking is straightforward
as targets are readily segmented; 3) trajectories are available
for event analysis such as retrogradation, aberrance, clash, etc.

However, research on highways and suburban areas does not
necessarily apply to urban areas, where two main problems
need to be solved if we rely on vehicle tracking. One
is that the background is rarely revealed in most urban
situations. For example, at intersections, whole lanes
may be occupied by queuing vehicles for a very long time
as vehicles come and go. The other is that tracking is
difficult (if not impossible) because vehicles are hard to
segment in a traffic jam, even when the background
information is still clean. In this case, vehicles are so heavily
(partially or totally) occluded that most tracking strategies
fail because of the difficulty in segmentation [7]. A blob
tracker [5] is likely to link two or more vehicles together,
either by region or by silhouette [8]. A point tracker [9] may
falsely group features from different targets.
Instead of taking these approaches, which rely on background
and segmentation, we propose a novel approach to measuring
the traffic state at congested intersections.

Fig. 1. Block diagram of our traffic state monitoring system. The blue arrows
show the learning path, whereas the red arrows show the classification path.

Our system does not involve background extraction or
vehicle segmentation. Instead, we classify vehicles from lanes
directly through an on-line learning scheme and a Bayes
classifier. The block diagram of our system is shown in Fig. 1. In
the ROI selection phase, lanes are divided into blocks. Features
are extracted from these blocks and fed into a learning model,
which consists of an unsupervised clustering phase in the
initialization stage and an on-line expectation-maximization (EM)
learning phase that runs after the initialization. Because
the initialization stage should be short whereas the on-line
learning rate is much lower, we do not merge the two,
although it is technically possible to do so.

The rest of this paper is organized as follows. Section II
briefly reviews related work. Section III discusses
ROI selection. Section IV focuses on feature selection.
Section V covers clustering and the EM algorithm, and
Section VI the on-line EM learning. Section VII presents the
Bayes classifier. Experimental results are given in
Section VIII, followed by the conclusion.

II. RELATED WORK 

Zanin [10] proposed a vehicle queue detection scheme based
on background subtraction and movement analysis. The need
for background information makes it difficult to handle scenes
that are usually congested, such as intersections. Porikli [11]
proposed a traffic congestion estimation scheme using Gaussian
mixture hidden Markov models (GM-HMM). They extracted features
from the MPEG compressed domain and trained a set of GM-HMMs to
estimate the traffic state. Kato [12] used an HMM-based
segmentation method to classify objects in traffic scenes as
shadows, foreground or background objects. Our approach does not
rely on background extraction or motion analysis, which means
that it can work well even if the vehicle queue is completely
stopped for a long duration. Another difference is that we
divide lanes into blocks and detect vehicles inside the blocks.
This eliminates the need for vehicle segmentation, and measuring
the length of the vehicle queue becomes as easy as counting the
blocks that contain vehicles.

Fig. 2. ROI demonstration. Rectangles of various sizes are used to
approximate a trapezoidal lane. Each rectangle is treated as an
independent sub-image in later processing.

III. ROI SELECTION 

The ROI selection scheme is simple and straightforward.
As the length of the waiting queue is the most
important parameter at an intersection, we need to detect the
existence of vehicles both near the pavement and at the far
end. Owing to the position of traffic cameras and perspective
projection, the far end of a lane appears narrower than the
near end. Therefore, we outline each lane with a shape close to
a trapezoid. We then divide the trapezoid into several rectangles;
each of these rectangles is called a region of interest (ROI). All
features are extracted within these ROIs. See Fig. 2 for
an example.
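
The division of a trapezoidal lane outline into rectangular ROIs can be
sketched as follows. This is a minimal illustration under assumptions that
are not stated in the text: the function name `trapezoid_to_rois`, the
equal-height bands, and taking each rectangle's width from the trapezoid's
width at the vertical midpoint of its band are all hypothetical choices.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def trapezoid_to_rois(top_y: float, bottom_y: float,
                      top_left_x: float, top_right_x: float,
                      bottom_left_x: float, bottom_right_x: float,
                      n_blocks: int) -> List[Rect]:
    """Approximate a trapezoid (far end at top_y, near end at bottom_y)
    by n_blocks stacked rectangles of equal height (an assumption)."""
    rois = []
    band_h = (bottom_y - top_y) / n_blocks
    for b in range(n_blocks):
        y0 = top_y + b * band_h
        y1 = y0 + band_h
        # Relative depth of the band's midpoint: 0 at the far end, 1 at the near end.
        t = ((y0 + y1) / 2 - top_y) / (bottom_y - top_y)
        left = top_left_x + t * (bottom_left_x - top_left_x)
        right = top_right_x + t * (bottom_right_x - top_right_x)
        rois.append((round(left), round(y0), round(right - left), round(y1 - y0)))
    return rois
```

Rectangles nearer the camera come out wider, which mirrors the perspective
effect described above.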

IV. FEATURE SELECTION 

For a classifier to work in real applications, multiple
features are usually needed. As the complexity of the classifier
depends on the dimension of the feature vector, we select
the features that are most discriminative. Moreover, the
feature extraction algorithms should be as simple as possible
to keep the overall system complexity low. In
our experiments, we found that basic image features, including
local entropy, edge and moment, are the best choice for our
system to work in real time. Since we use a naive Bayes
classifier, the ease of normalizing each feature was also taken
into account when choosing these features.

A. Maximum Local Entropy 

Entropy is a good measure of texture and is commonly
used in object detection [4] [13]. It is defined as

\[ E = -\sum_{i=1}^{L} p(i) \log_2 p(i), \]

where $\sum_{i=1}^{L} p(i) = 1$ and $p(i) \ge 0$ for $i = 1, ..., L$.
The probability is naturally represented by the gray-level histogram,

\[ p(i) = \frac{1}{wh} \sum_{j=1}^{w} \sum_{k=1}^{h} \mathbf{1}\{I(j,k) = i\}, \tag{1} \]

where $w$ and $h$ are the width and height of the image patch
respectively, and $\mathbf{1}\{\cdot\}$ is the indicator function.

Fig. 3. Local entropy measurement of two images. The first column shows
two different images containing the same vehicle, fully and partially. The
second column shows the location of the maximum local entropy measurement,
denoted by the red squares. Note that the two red squares are in the same
position relative to the vehicle. The last column shows the local entropy
measurement of the two images as gray-scale images.

It is easy to show that the entropy attains its maximum value
$\log_2 L$ if and only if $p(i) = 1/L$ for $i = 1, ..., L$, which
means that scattered gray levels lead to large entropy, while a
clean background has zero entropy. Entropy is therefore a
discriminative feature.
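
As a quick check of these two extremes, the entropy of a patch can be
computed directly from its gray-level histogram. The helper name `entropy`
is an assumption made for illustration, and the patch is passed as a flat
sequence of gray levels.

```python
import math
from collections import Counter


def entropy(pixels):
    """Shannon entropy (in bits) of the gray-level histogram of a patch."""
    n = len(pixels)
    counts = Counter(pixels)  # histogram: gray level -> number of pixels
    # Levels that never occur contribute nothing, so only observed levels are summed.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A constant patch gives zero entropy, while a patch in which all L gray
levels occur equally often reaches the upper bound of log2(L).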

While entropy is a good measure of texture, applying
it directly to a whole image obscures the details. Two
different images may have the same entropy value, while two
images that both contain the same object may have different
entropy values. For example, in Fig. 3 the images in the first
column both contain the same vehicle (the first image
has a clear view and the second only a partial view).
The entropy of the first image is 5.5953, while that
of the second is 6.2018. To better capture the
similarity of these two images, we introduce a local entropy
measurement. The histogram in a window centered at pixel $(x, y)$ is

\[ p_{x,y}(i) = \frac{1}{(2w'+1)(2h'+1)} \sum_{j=x-w'}^{x+w'} \sum_{k=y-h'}^{y+h'} \mathbf{1}\{I(j,k) = i\}, \tag{2} \]

where $2w'+1$ and $2h'+1$ are the window width and height
respectively; $w'$ and $h'$ vary from 1 to 3 across scenes
and resolutions. The entropy is computed from this local
histogram, and its maximum over the image is taken as the feature.
The local entropy measurement is illustrated in Fig. 3.
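
A maximum local entropy search can be sketched as below. The names
`patch_entropy` and `max_local_entropy` are hypothetical, and the sketch
assumes that only windows lying fully inside the image are evaluated, which
the text does not specify.

```python
import math
from collections import Counter


def patch_entropy(values):
    """Shannon entropy (bits) of the gray-level histogram of a window."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def max_local_entropy(image, w=1, h=1):
    """image: list of rows of integer gray levels.
    Slides a (2w+1) x (2h+1) window over the image and returns
    (largest entropy, (row, col) of the window center)."""
    rows, cols = len(image), len(image[0])
    best, best_pos = -1.0, None
    for r in range(h, rows - h):          # skip borders (assumption)
        for c in range(w, cols - w):
            window = [image[j][k]
                      for j in range(r - h, r + h + 1)
                      for k in range(c - w, c + w + 1)]
            e = patch_entropy(window)
            if e > best:
                best, best_pos = e, (r, c)
    return best, best_pos
```

The returned position plays the role of the red squares in Fig. 3: it marks
where the most textured window sits.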

B. Edge 

Unlike approaches that tackle complete vehicle detection, we take
edges as another measure of texture. Four edge detection
kernels are defined as follows:

\[ h_1 = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}, \quad
   h_2 = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \]

\[ h_3 = \begin{bmatrix} -1 & -1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad
   h_4 = \begin{bmatrix} 0 & -1 & -1 \\ 1 & 0 & -1 \\ 1 & 1 & 0 \end{bmatrix}. \]

$h_1$ and $h_2$ are the well-known Sobel detectors, which detect
horizontal and vertical edges respectively; $h_3$ and $h_4$ detect
left-diagonal and right-diagonal edges respectively. After convolving
the above kernels with a gray-scale image, we obtain an
edge image. The edge image is further processed by the
operation:
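
The kernel responses themselves can be computed as in the sketch below. The
diagonal kernels `H3` and `H4` follow one plausible reading of the kernel
definitions, and combining the four responses by their maximum absolute
value is purely an assumption for illustration; it is not the
post-processing operation referred to in the text. Because each kernel only
changes sign when rotated by 180 degrees, correlation and convolution give
the same absolute response.

```python
H1 = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # Sobel, horizontal edges
H2 = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # Sobel, vertical edges
H3 = [[-1, -1, 0], [-1, 0, 1], [0, 1, 1]]    # diagonal (assumed reading)
H4 = [[0, -1, -1], [1, 0, -1], [1, 1, 0]]    # diagonal (assumed reading)


def filter3x3(image, kernel):
    """Correlate a 3x3 kernel with an image (list of rows); borders are dropped."""
    rows, cols = len(image), len(image[0])
    out = [[0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r - 1][c - 1] = sum(kernel[dj][dk] * image[r - 1 + dj][c - 1 + dk]
                                    for dj in range(3) for dk in range(3))
    return out


def edge_strength(image):
    """Per-pixel maximum absolute response over the four kernels
    (a hypothetical way of merging the responses)."""
    responses = [filter3x3(image, k) for k in (H1, H2, H3, H4)]
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[max(abs(resp[r][c]) for resp in responses) for c in range(cols)]
            for r in range(rows)]
```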



TABLE I 

An Example of a Table 
C. Other measurements

We further define three measurements. The first is the
rate of non-zero histogram bins, $B$, defined as

\[ B = \frac{1}{L} \sum_{i=1}^{L} \mathbf{1}\{p(i) > 0\}. \]

Another feature is the normalized first moment, defined as

\[ M_1 = \frac{1}{L} \sum_{i=1}^{L} i \cdot p(i), \]

which is simply the (normalized) mean of the image intensity. The
last feature is the second moment of the image, $M_2$.
To normalize $M_2$, we need to find its maximum value.
Another definition of $M_2$ is

\[ M_2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \Big( \frac{1}{n} \sum_{i=1}^{n} x_i \Big)^2
       = x^T \Big( \frac{1}{n} I_n - \frac{1}{n^2} b b^T \Big) x, \]

where $b = [1, ..., 1]^T$ and $I_n$ is the identity matrix. We treat
an image as an $n \times 1$ vector $x$, and $x_i$ is its $i$th
component. Finding the maximum value of this expression is a quadratic
programming problem subject to $1 \le x_i \le L$ for $i = 1, ..., n$.
The maximum value is stated without derivation, and dividing $M_2$
by it gives the normalized version of the feature.
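
The three histogram-based measurements can be computed as sketched below.
Gray levels are assumed to run from 1 to L, and M_2 is taken to be the
intensity variance, following the alternative definition in the text. The
normalizing constant (L - 1)^2 / 4 is an assumption: it is the largest
variance that any set of values in [1, L] can have (Popoviciu's inequality),
reached when half of the pixels sit at each extreme. The function name
`histogram_features` is hypothetical.

```python
def histogram_features(pixels, levels=256):
    """pixels: flat list of gray levels in 1..levels.
    Returns (B, M1, M2, M2 normalized by its assumed maximum)."""
    n = len(pixels)
    counts = [0] * (levels + 1)           # index 0 is unused
    for v in pixels:
        counts[v] += 1
    p = [c / n for c in counts]           # histogram as probabilities

    # B: fraction of histogram bins that are non-empty.
    b = sum(1 for i in range(1, levels + 1) if p[i] > 0) / levels
    # M1: mean intensity divided by the number of levels.
    m1 = sum(i * p[i] for i in range(1, levels + 1)) / levels
    # M2: variance of the intensities.
    mean = sum(pixels) / n
    m2 = sum(v * v for v in pixels) / n - mean ** 2
    # Assumed normalization: divide by the Popoviciu bound (L - 1)^2 / 4.
    m2_norm = m2 / ((levels - 1) ** 2 / 4)
    return b, m1, m2, m2_norm
```

With this choice, a patch split evenly between the darkest and brightest
level gets a normalized second moment of 1, and a constant patch gets 0.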
D. Selection criterion

We test each of these measurements using two sets of
samples that are manually labelled beforehand. The quality
of each feature is judged by Fisher's criterion,

\[ J = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}, \]

where $\mu_i$ and $\sigma_i^2$ are the mean and variance of the
feature value of class $i$ respectively. A large $J$ means a large
inter-class distance and a small intra-class variation. All feature
measurements are tested on a training set containing 4000 vehicle
images and 2000 lane images. The results are given in Table I, and
Fig. 4 shows the separation in the form of probability density
functions (PDF).

Note that all features are normalized before being assembled
into a feature vector. Because we use the Euclidean distance, a
non-normalized feature vector would let the components with
larger scales dominate.
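
Ranking a single feature with Fisher's criterion can be sketched as follows,
using the usual two-class form J = (mu_1 - mu_2)^2 / (sigma_1^2 + sigma_2^2).
The function name `fisher_score` and the sample values in the usage note are
made up for illustration.

```python
def fisher_score(class_a, class_b):
    """Fisher's criterion for one scalar feature measured on two labelled sets."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / len(xs)

    mean_a, var_a = mean_var(class_a)
    mean_b, var_b = mean_var(class_b)
    # Undefined when both classes have zero variance; callers should guard for that.
    return (mean_a - mean_b) ** 2 / (var_a + var_b)
```

For example, `fisher_score([0, 2], [10, 12])` is much larger than
`fisher_score([0, 10], [4, 14])`, because the first pair of classes is both
further apart and more compact.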
Fig. 4. The red and blue curves represent the distributions of each feature
extracted from training samples that contain vehicles and from those that
contain only background, respectively. These distributions give an intuition
of which features are better suited for classification. The title of each
chart follows the symbols given in Table I.



V. CLUSTERING AND ONLINE EM ALGORITHM 

A. Clustering 

We use the standard K-means clustering algorithm to
coarsely estimate the cluster centroids, which serve as an
initial guess for the subsequent EM clustering algorithm.
Without discussing the algorithm in detail, we give the K-means
parameter values used in our case:

1. K = 2, i.e. there are two clusters: vehicle and lane.
2. Cluster distance is measured by the Euclidean distance.
3. The algorithm is repeated N times to avoid being trapped in a
local optimum (N = 3 is sufficient).
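
The three settings above can be sketched as a small K-means routine. This is
an illustrative implementation, not the one used in the system: the function
names, the random initialization from data points, the fixed seed, and the
choice of keeping the restart with the lowest within-cluster squared
distance are all assumptions.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            for p in points]


def kmeans(points, k=2, restarts=3, max_iters=100, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):                       # N restarts
        centroids = [tuple(p) for p in rng.sample(points, k)]
        for _ in range(max_iters):
            labels = assign(points, centroids)
            updated = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    updated.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
                else:                               # keep an empty cluster's centroid
                    updated.append(centroids[c])
            if updated == centroids:                # converged
                break
            centroids = updated
        labels = assign(points, centroids)
        cost = sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or cost < best[0]:
            best = (cost, centroids, labels)
    return best[1], best[2]
```

The returned centroids are what would seed the subsequent EM stage.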

B. EM for GMM

The output of the K-means algorithm consists of two cluster
centroids, together with labels indicating which cluster
each data point belongs to. The two clusters are further
modelled with Gaussian distributions, i.e.

\[ p(x \mid x \in c; \mu_c, \Sigma_c) = \frac{1}{(2\pi)^{n/2} |\Sigma_c|^{1/2}}
   \exp\Big( -\frac{1}{2} (x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c) \Big), \]

where $c \in \{l, v\}$ represents lane and vehicle respectively,
$x \in \mathbb{R}^n$ is the feature vector, $\mu_c \in \mathbb{R}^n$
is the mean vector, and $\Sigma_c \in S^n_{++}$ denotes the covariance
matrix, which is positive definite. We further simplify the covariance
matrix to a diagonal matrix to lower the computational complexity
without much loss of accuracy.

Suppose that we have $m$ feature vectors that we want
to divide into two classes, each modelled with a separate
Gaussian distribution. In other words, the problem is
to find $\mu_c$, $\Sigma_c$ and to determine more accurately the
membership of every data point in the two models. Using the
well-known EM algorithm, we can easily solve this mixture model
problem [5]. First, we define the class prior as
$p(x \in c) = \phi_c$, with $\sum_{c \in \{l,v\}} \phi_c = 1$ and
$\phi_c > 0$. Then, a commonly used EM-based GMM algorithm is listed
here without proof.

1. E-step: For every $i$ and $c$, estimate the memberships according to

\[ \omega_c^{(i)} = p(x^{(i)} \in c \mid x^{(i)}; \phi_c, \mu_c, \Sigma_c)
   = \frac{1}{D}\, p(x^{(i)} \mid x^{(i)} \in c; \mu_c, \Sigma_c)\, p(x^{(i)} \in c; \phi_c) \tag{5} \]

2. M-step: Update $\phi_c$, $\mu_c$ and $\Sigma_c$ from the
memberships $\omega_c^{(i)}$.

In the above algorithm, $x^{(i)}$ denotes the $i$th feature vector,
and $D$ is the denominator that normalizes $\omega_c^{(i)}$ so that
$\sum_c \omega_c^{(i)} = 1$ for every $i = 1, ..., m$. The initial
value of $\phi_c$ is 0.5, that of $\mu_c$ is the corresponding cluster
centroid output by K-means, and that of $\Sigma_c$ is the identity
matrix. A snapshot of the K-means and EM clustering results taken
from our system is shown in Fig. 5, which shows the cluster
distribution of the first two components in the feature space.

Fig. 5. Clustering results for the first two dimensions of the feature
vectors. The left plot is the K-means clustering result; the right plot
is the EM clustering result. There is a considerable difference between
them.
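
A compact version of this EM loop for Gaussians with diagonal covariance is
sketched below. The initialization follows the text (equal priors, K-means
centroids as means, identity covariance). The M-step uses the standard
weighted-mean and weighted-variance updates for a Gaussian mixture, which is
an assumption about the exact form of the update equations; the function
names and the variance floor are likewise hypothetical.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * vi) - 0.5 * (xi - mi) ** 2 / vi
    return math.exp(log_p)


def em_gmm(data, init_means, iters=20, var_floor=1e-6):
    k, dim, m = len(init_means), len(init_means[0]), len(data)
    phi = [1.0 / k] * k                       # priors: 0.5 each for two classes
    means = [list(mu) for mu in init_means]   # K-means centroids
    var = [[1.0] * dim for _ in range(k)]     # identity covariance
    w = []
    for _ in range(iters):
        # E-step: memberships, normalized by D so they sum to 1 per sample.
        w = []
        for x in data:
            joint = [phi[c] * gaussian_diag(x, means[c], var[c]) for c in range(k)]
            d = sum(joint)  # assumed non-zero, i.e. no sample is far from every class
            w.append([j / d for j in joint])
        # M-step: standard weighted updates (assumed form).
        for c in range(k):
            total = sum(w[i][c] for i in range(m))
            phi[c] = total / m
            means[c] = [sum(w[i][c] * data[i][j] for i in range(m)) / total
                        for j in range(dim)]
            var[c] = [max(var_floor,
                          sum(w[i][c] * (data[i][j] - means[c][j]) ** 2
                              for i in range(m)) / total)
                      for j in range(dim)]
    return phi, means, var, w
```

Unlike the hard labels from K-means, the memberships `w` are soft, which is
one reason the two clustering results can differ noticeably.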

VI. ON-LINE EM 

Suppose that we already have samples in hand and the
GMM has been fitted accordingly. As the monitoring system
runs, the environment (mostly the lighting conditions)
gradually changes. Without updating, the system will probably
fail once the change becomes significant. Therefore, an
on-line learning phase is adopted to update the parameters of
the GMM.

Derivation of the on-line EM algorithm is intuitive. Let
$x^{(m+1)}$ denote the incoming sample. Following the notation
of the previous EM algorithm, we have the following on-line
learning algorithm:

1. Estimate $\omega_c^{(m+1)}$ according to (5).
2. Update the model parameters.

One problem with on-line EM is that, when $m$ is too large,
$M_{c,m}$ also becomes so large that the incoming sample
barely influences the model parameters. To avoid this, we fix
the ratio involving $M_{c,m}$ to a constant, $\lambda = m/(m+1)$,
and then adjust the model update equations accordingly.
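
One way to realize a fixed-rate update is exponential forgetting, sketched
below. This is a stand-in under stated assumptions rather than the update
equations of the system: the function name `online_update`, the particular
recursions for the prior, mean and diagonal variance, and the default
`lam = 0.99` (which corresponds to fixing m = 99 in lambda = m / (m + 1)) are
all hypothetical.

```python
def online_update(x, w, phi, means, var, lam=0.99, var_floor=1e-6):
    """Update the mixture parameters in place with one new sample.

    x     : incoming feature vector
    w     : memberships of x in each class (they sum to 1)
    phi   : class priors
    means : per-class mean vectors (lists)
    var   : per-class diagonal variances (lists)
    lam   : forgetting factor; values closer to 1 mean slower adaptation
    """
    for c in range(len(phi)):
        # Old evidence decays by lam; the new sample enters with weight (1 - lam).
        phi[c] = lam * phi[c] + (1.0 - lam) * w[c]
        # Effective learning rate of class c for this sample.
        rho = (1.0 - lam) * w[c] / phi[c]
        for j in range(len(x)):
            means[c][j] += rho * (x[j] - means[c][j])
            var[c][j] = max(var_floor,
                            (1.0 - rho) * var[c][j]
                            + rho * (x[j] - means[c][j]) ** 2)
    return phi, means, var
```

Because the rate no longer shrinks as more samples arrive, a gradual change
in lighting keeps moving the class models instead of being averaged away.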

VII. BAYES CLASSIFIER 

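Given a sample feature vector and two trained class models,
the posterior follows from Bayes' rule:

\[ p(x \in c \mid x; \theta_c) = \frac{p(x \mid x \in c; \theta_c)\, p(x \in c; \theta_c)}
   {\sum_{c' \in \{l,v\}} p(x \mid x \in c'; \theta_{c'})\, p(x \in c'; \theta_{c'})} \tag{14} \]

for $c \in \{l, v\}$. Here $p(x \mid x \in c; \theta_c) = N(x; \mu_c, \Sigma_c)$
is the likelihood term, $\theta_c = [\mu_c, \Sigma_c, \phi_c]$, and
$p(x \in c; \theta_c) = \phi_c$ is the prior. A feature vector is said
to be from class $v$ if $p(x \in v \mid x; \theta_v) > p(x \in l \mid x; \theta_l)$.

Another form of (14) is the discriminant function
$f(x) = \log p(x \in v \mid x; \theta_v) - \log p(x \in l \mid x; \theta_l) > 0$,
which is further simplified as

\[ f(x) = -\tfrac{1}{2} (x - \mu_v)^T \Sigma_v^{-1} (x - \mu_v)
          + \tfrac{1}{2} (x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l)
          - \tfrac{1}{2} \log|\Sigma_v| + \tfrac{1}{2} \log|\Sigma_l|
          + \log \phi_v - \log \phi_l > 0 \tag{15} \]

Every incoming feature vector is classified as vehicle or lane
according to (15).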
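
The decision rule can be sketched as a comparison of log joint densities,
since the shared denominator of the posterior cancels. The sketch assumes
diagonal covariances and represents each class model as a tuple
`(phi, mean, var)`; these names and the layout are hypothetical.

```python
import math


def log_joint(x, phi, mean, var):
    """log of prior * Gaussian likelihood for one class (diagonal covariance)."""
    s = math.log(phi)
    for xi, mi, vi in zip(x, mean, var):
        s += -0.5 * math.log(2 * math.pi * vi) - 0.5 * (xi - mi) ** 2 / vi
    return s


def is_vehicle(x, theta_v, theta_l):
    """True if the discriminant f(x) = log joint(vehicle) - log joint(lane) > 0."""
    return log_joint(x, *theta_v) - log_joint(x, *theta_l) > 0
```

Working in the log domain avoids underflow when the feature vector has many
components, and a strongly unequal prior shifts the decision boundary toward
the less likely class.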

VIII. EXPERIMENTAL RESULT 

Although our video sources contain color information, we
use only the intensity channel. The image sequences are
recorded at 25 FPS. Because high temporal precision
is not necessary, the video is down-sampled to 5 FPS,
which greatly reduces the computational cost.
The values of the aforementioned parameters are
summarized in Table II.

IX. CONCLUSION

In this paper, we have proposed a congestion estimation
system with wide application in urban traffic scenes. Using
a traffic monitoring camera, it can effectively estimate
the current traffic state, measure the length of the vehicle
queue and provide useful information to traffic control
departments. We simplify the processing flow of our system by
dividing lanes into blocks, which eliminates the need for vehicle
segmentation. However, this brings a new challenge,
namely detecting partly visible objects. By incorporating GMM
and a Bayes classifier into our system, we have successfully
tackled the detection of vehicle existence, assuming that only
lanes and vehicles are present. In future research, vehicle type
recognition and pedestrian detection within lane blocks will be
of great interest to us.

Fig. 6. Four different scenes are presented here. A red square means that
a vehicle is present, whereas a green square means otherwise.